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ABSTR?ICT 

The empirical model builder utilizing regression tech- 
niques frequently relies on the coefficient of determination, 
2 

R , to measure 'goodness of fit'. Costing and pricing ana- 
lysts using such variable selection techniques frequently 

2 

encounter inflated R values. This paper examines the space 
within which the regression model operates and presents prac- 
tical optimization algorithms to help assess 'the amount of 

2 

confidence that can be placed in R for a particular set of 

candidate predictor variables. The algorithms describe a 

technique using linear programming to find the lowest value 
2 

of R possible using the given set of data. 
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INTRODUCTION 



I . 

A. PROBLEM MOTIVATION 

Parametric cost estimation is a management tool used to 
aid in the prediction of the cost of a proposed system. It 
involves predicting the cost (dependent variable) of a system 
by means of explanatory (independent) variables such as sys- 
tem characteristics or performance requirements. This proce- 
dure is based on the premise that the cost of a system is 
related in a quantifiable way to the system's physical and 
performance characteristics. The expression of this quanti- 
fiable relationship is in the form of an estimating equation 
derived through statistical regression analysis of historical 
cost data on systems which are, more-or-less , analogous to 
the proposed system. Since parametric cost estimates can be 
developed during the concept formulation stage of the acqui- 
sition process before engineering plans are finalized, these 
estimates can be used by management to: 

(1) Identify possible cost/performance tradeoffs in the 
design effort. 

(2) Provide a basis for cost/effectiveness review of 
performance specifications . 

(3) Provide information useful in the ranking of 
competing alternatives. 

(4) Suggest a need for investigating new alternatives. 

Cost overruns have been prevalent in the acquisition 

process for new weapon systems making cost estimation a very 



important problem for all components of the Department of 
Defense . 

To combat this problem, the Department of Defense has 
issued directives to employ independent parametric cost 
estimation. Publications such as Reference [1] have appeared 
which give step by step methodology for the development of 
a parametric cost estimate. 

Regression problems faced by costing and pricing analysts 
in these situations are inherently difficult for two fundamen^ 
tal reasons [Ref. 2] ; 

(1) The number of observations is usually small compared 
with the number of system characteristics which are 
candidate components of the regression equation. 

(2) The available data is not produced by employing an 
efficient experimental design. 

Under these circumstances, it has been shown that the use 

of variable selection techniques may result in regression 

2 

equations which yield inflated R values whose statistical 

significance cannot be tested using the F-test.^ 

In general parametric cost estimation, an analyst should 

not blindly trust the regression equation resulting from his 

analysis. To measure the 'goodness of fit', the analyst can 

2 

use such statistics as R and F. (As noted earlier, however, 
regression models for cost estimation often do not allow use 
of F.) There are few hard and fast rules for assessing the 



1 2 

This IS the case when R is not significant, and some, 
but not all, of the 3^ are significant. 
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usefulness of such a model. This is especially true of models 
that result from the application of a variable selection tech- 
nique in order to obtain a 'best' prediction equation. The 
2 . 

R statistic in these situations may not give a meaningful 

indication of the model's applicability. 

The purpose of this paper is to investigate the coeffi- 

2 

cient of determination, R , used in best subset regression 

analysis. The solution algorithms presented in this paper 

provide a practical method to help assess the confidence 

2 

placed by the empirical model builder in R for a regression 
upon a particular set of exogenous data. It may contribute to 
the theoretical foundation for the understanding of regression 
models, whose properties are not fully understood. 

B. STATEMENT OF THE PROBLEM 

Suppose the analyst selects n-independent observations on 
p-predictor (candidate) variables and one dependent variable. 
The goal of the analysis is to determine the k-variable re- 
gression equation which maximizes the coefficient of deter- 
mination for various values of k. The difficulty with this 

2 

analysis is assessing the statistical significance of R for 
a given value of k. 

The n-independent observations, mathematically, span a 
n-dimensional finite vector space (call it E^) . The regres- 
sion procedure projects the dependent variable, Y, onto sub- 
spaces within looking for the best fit (prediction) . How 
the subspaces are oriented in E^ therefore dictates the quality 
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of fit that can be obtained. The subspaces for are deter- 
mined by the candidate predictor variables; each predictor 
variable's observed values representing a vector in and 

each combinatorial set of the p-predictor variables spanning 
a subspace. Obviously, there are (^) possible k-variable pre- 
diction equations. Therefore, the analyst's selection of the 
p-variables to be used as candidates determines which sub- 
spaces are available for the regression procedure to consider 

2 

in best subset selection. 

Wallenius [Ref. 3] in trying to gather information on the 

unknown distribution of R asks, "How well do the (^) subspaces 

spanned by all the subsets of columns of X 'fill' E^?" (Where 

X is the (n x p) matrix of n-observations on the system's 

p-characteristics . ) In other words, using the candidate pre- 

2 

dictor variables selected by the analyst, will the highest R 
value obtainable through regression differ very much from the 
worst possible? 

Wallenius characterized this problem mathematically by 
defining the 'coefficient of fill' (COF) as follows: 



R . (X,k) 

min 



= min 



max 
1 ' * 



Y (i^ , . . . 1 



> i-(x, , 

k^ ^1 



X 



• X. ) 

^k 



This formula can be interpreted to ask, "If given some set 
of candidate predictor variables, what is the worst Y that 



^The total 
regressed upon 



number of exogenous 
is p . 



variab les 



that Y 



is 
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can be predicted?" Where 'worst' can be identified by the 

2 2 
lowest R value obtainable. Thus, a lower bound for R is 

also obtained by answering this. 

The problem is extremely difficult to solve directly. This 
paper proposes an algorithm for solving this problem using 
optimization with a surrogate objective function. The number 
of optimizations required to obtain a global solution is 
exponentially related to p (the number of predictor variables) . 
Thus, in the area of p = 14 , enumeration begins to become 
economically infeasible as a solution technique, and hence, 
an algorithm is proposed to search the area about some mini- 
mum point. This latter algorithm cannot guarantee a global 
solution, so it must be considered local in nature. Whether 
such a local solution is useful requires further research; 
usefulness may be directly dependent upon the empirical nature 
of the data. 
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II. SUPPORTING CONCEPTS 



A. GENERAL LINEAR REGRESSION USING LEAST SQUARES 

A general and frequently used linear model is the 'multi 
pie linear regression model'. It can be represented as 
follows ; 



V = + StX.. + + ... + S X. + z. . 

0 111 2i2 nin i 



The variable represented by y is the variable of interest 
(i.e., to be predicted). The variables represented by the 
are associated with y and may influence the behavior of y. 

Thus, mathematically y is called the dependent variable 
(endogenous) and the x variables are called independent varia- 
bles (exogenous) . Statistically, this model is referred to 
as the regression of y on the x variables. The coefficients 
3 are referred to as 'partial regression coefficients' and 
they specify the linear functional relationship between the 
independent variables and the dependent variable. Mathe- 
matically, the 3j ^re the partial derivatives of the functional 
relationship 3y/3Xj . Thus, a Sj indicates the change in the 
dependent variable y corresponding to a unit change in the 
independent variable x^ (all other independent variables held 
fixed) . 

There are various criteria used in regression, however the 
formulation of interest for this paper is based upon the least 
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r UUIII- 






squares criteria. Use of least squares to derive the formula 
for estimating Y is shown below. 

Using matrix notation, the regression model can be written 



as : 



Y = X§_ + ^ , 



where 



Y = 




X 



^ ^11 ^12 
1 

1 




/ 



1 



nl 



X 



n2 



X 

nm 



= 


^0 


' i = 


"i" 








^2 




®m+l 




£ 

n 








_ _ 



(a) X is the n x (m+1) matrix of n-observations on 
m-independent (x) variables plus a dummy variable. 

(b) Y is the column vector of the n-observed values of Y. 

(c) £ is the column vector of the m+1 partial regression 
coefficients (whereas, ^ is the vector of estimated 
regression coefficients) . 
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Formulating the least squares 



T 

min £ e 



(Y-XB) (Y-X6) 



taking the derivative with respect to 3 



T T T* T 

|^[Y Y - 2Y X^ + £ X X^] 



0 



and then solving for we get the estimate 
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T -IT 
(X X) X Y . 



B. STANDARDIZATION AND NORMALIZATION OF VECTORS 

The regression coefficients of the linear model are func- 
tions of the units of measurement of the variables. The 
magnitudes of coefficients are influenced by choices of units 
of measurement. Thus, a tantamount scaling problem to that 
experienced in linear programming exists. This scaling prob- 
lem is avoided by use of 'standardized regression coefficients'. 

Standardized regression coefficients are the end result 
when the variables which they are estimated from have been 
transformed to unit variance. 

Consistent with previous notation, and for use further on 
in this paper, an alternative form to obtain 'standardized' 
predictions of the dependent variable y^ for all i (repre- 
sented as Y^) is ; 
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* = 



a. 



(Y - y 1 ) , or Y = 



w Y 



(a) (*) denotes 'standardized'. 

(b) 1 denotes a (n x 1) column vector of ones. 

Of great importance is that a linear transformation has 
been performed, and that the intercept of the regression 
equation is zero (i.e., the equation passes through the ori- 
gin) . Each partial regression coefficient indicates how many 
standard deviation changes in y are associated with one standard 
deviation change in the corresponding x (all other x^ held 
fixed) . Also, a mathematical characteristic is that 



n 



I y. = 0 

i=l ^ 



or, equivalently. 



T 

l^Y* = 0 . 

Mathematically, normalization is a linear transformation 
that takes any given vector and converts its length to unit 
length (length in multidimensional vector spaces discussed 
in the next section) . A vector of unit length is said to have 
a norm = 1. Any arbitrary vector can be transformed into a 
unit vector by dividing it by its norm. 

A vector with unit length can be depicted as follows; 

n 

I y. = 1 , or y\ = 1 . 

i=l ^ 



16 



C. LENGTH, ANGLE, AND COSINE FUNCTION IN MULTIDIMENSIONAL 

VECTOR SPACES 

Using the concept of inner products, length or magnitude 
of a vector can be defined. In this context, the length is 
referred to as the 'norm' of the vector. 

The norm of the arbitrary vector ,X 2 , . . . ,x^) in 
is denoted by 

1 I ( > ^2 ^ * * * ^ ^ n ^ ^ ^ ^ ^ ^1 ^ ^2 ^ * * * ^ ^ ^1 ^ ^2 ^ * * * ^ ^ n ^ * 

In matrix notation. 



|X|| = (x'^X) . 



Thus, a normalized vector would be created as follows: 

- 1 ^ 

- “ - ■ nm ^ V2 • 

To obtain the cosine of the angle between two vectors, 
one normalizes each vector and then takes their inner product 
If u and V are vectors in R^, the cosine of the angle can 
be defined as: 



cos 6 



u 



u 




V 



Two vectors are orthogonal if and only if their inner product 
is zero. This means the angle between them is 90°, cos 9=0 
and V u = 0 . 
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D. PROJECTIONS 



Let u and v be vectors with angle a between them. The 
scalar projection (or component) of v in the direction of u 
is defined to be j |v| j cos a. Geometrically, it can be visu- 
alized as in figure one . ^ 




Figure 1. Scalar Projection of V onto U 
Alternatively, computation is easier if written as: 
scalar orojection of v onto u = v • i i ^ '| i , 

^ i l^M 

r\ 

This is the inner product (dot product in E’ ) of v with the 
unit vector in the direction of u. 



^Note that the dashed line is perpendicular to u. 
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The vector 
unit vector in 
tion of V onto 

proj^v = 



projection is the scalar projection times the 
the direction of u. So, the vector projec- 
u is written: 



V cos a 




(V 




u 

^1 1 ‘ 



19 



III. PROBLEM FORMULATION ONE 



Recalling from Chapter II the expression for estimating 
the regression coefficients, 

T -1 T 

^ = (X X) X Y , 

ip 

there is a unique solution for ^ if the matrix XX is non- 
singular, that is, if it is full rank. This is the case if 

T 

X has n-independent columns, since the columns of X X and 
the rows of X span the same space [Ref. 4] . 

Thus, given a (n x p) matrix X of full rank and n < p, a 
finite dimensional vector space E^ can be defined where n- 
linearly independent column vectors span the space (i.e., form 
a basis) . Extending linear algebra concepts to the linear 
regression model of Section II. A, a geometric interpretation 
is that k columns (where k _< n) of the X matrix span a k- 
dimensional subspace in the n-dimensional space E^ . Further, 
the least squares procedure, through 'best' subset selection, 
will select amongst the (^) columns of X a k-dimensional sub- 
space of E^ to predict the vector of dependent variables (Y) 

2 2 

such that cos 9 = R is maximized (9 is minimized) ; where 9 
is the angle between Y and its orthogonal projection onto a 
candidate subspace. 
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A. RESTATEMENT OF 'COEFFICIENT OF FILL' (COF) 

Consider the matrix X. Assume it has rank = n. Next, 
require that Y, the vector of dependent variables be obtained 
through standardized regression coefficients (see Section II. B) 
that is , 



Y* = r^(Y-Y) . 

Y 

This requirement causes no loss in generality since, 

2 _ 2 

(Xj^, . . . x^) ~ (Xj^ , . . .Xj^) 

Further, by requiring Y to be a unit vector (normalized) , a 
unique Y is specified (i.e., all other vectors in the direction 
of Y will be a scalar multiple of the normalized vector) ; that 
is 



Y*Ty* 



1 . 



The 'coefficient of fill' (defined in Section I.B) now becomes 
[Ref. 3] : 






Min Max 

iTy* = 0 {i^, . . .ij^} 

Y*Ty*=1 




X . 
1 



) • 



K 
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B. TRANSFORMATION OF THE COF TO A QUADRATIC FORM 

2 

Wallenius [Ref. 3] showed that the R portion of the COF 
can be transformed into a quadratic form. Such a transforma- 
tion can be done as follows. 

Identify arbitrarily any combination of the (^) columns 

of X. Let the i^ such combination be called D. . Recall that 

1 

the columns of define a subspace in and any vector in E^ 
can be projected onto that subspace. Assume that has rank k. 




Figure 2. Subspace Spanned by D 



From linear algebra (vector projections) and figure two, it 
follows that 



Z. = proj^ 1=1 
i i=l 

/N /\ 

where X, ,...,X, is an orthonormal basis for D. . 
— 1 — k 1 
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From least squares regression, an alternate calculation 



for Z . is as follows : 
—1 



fp * 

Z . = D. (DTD. ) DTY 

— 1 111 1 — 



Recalling that 







T -IT* 
(DTD.) DTY 
11 1 — 



I 



and that the ’generalized inverse' of a matrix is 



D. 

1 



(D^D.)-^D^ 
11 1 






the calculation of Z. can be stated as: 

—1 



Z . 
— 1 



D . 6 . where 3 • 
1—1 — 1 



- * 

D.Y . 
1 — 



In Section II. C, the cosine of the angle between two vec- 

★ 

tors was defined. To calculate the angle between Y and its 
orthogonal projection , use 

* 

Y • Z, 

Z . 

—1 

Through substitution and simplification, this can be written 
as 



cos 0 . 
1 
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cos 



* — * 
Y -D . D . Y 
— 11 — 



Recalling from Section II. B that Y -Y =1, and squaring both 
2 

sides, cos 9. can be written as follows; 

' 1 



cos 9 . = 

1 



Diii 



— T — * 

Y D.D7D.D.Y 
— 1111— 



■ *i I 2 



Thus, 



o2 2 

= cos I 

— ' i 



Matrix algebra then allows the following conversion for 



— 1 



T 

DTD.D. 

Ill 



- T T 



- T T 

[D.D.D.] = d: 

111 1 



4*:Di 



*'!' T * 

Y D. DTY 
— 11— 



y*(d.dT) Y* 
— 11 — 



Noting that is symmetric, and letting 



B . = D. D. , 

1 11 



we can write 



2 *<p * 

V;D, = I 
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Thus , the COF can be re-expressed as : 



min max Y *^B.Y (COFQ) . 

Y* i ^ 

C. A GAME INTERPRETATION OF THE COFQ 

The COFQ can be viewed as a game between a person and 

nature. The person tries to choose the matrix that will 
2 

maximize R , and nature plays the part of an antagonist who 

wants to create the least favorable Y to be predicted by 

th at B . . 

1 

The game is visualized as depicted in figure three. For 
a fixed Y, the regression 'black box' (labled '!') determines 

/s 

the ^ whose coefficients express Y as a linear combination 

of the 'best' subset of independent variables (columns of 

matrix X) , D . . 

' 1 






R,^=Y’’8;Y ! 

■>- 






Y; 



'--Kr 



8 - 





V 


i 


- 

1 







i 



Figure 3. Game Interpretation of COFQ 
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Nature upon seeing the subspace represented by the matrix 
antagonistically chooses the worst Y in so as to thwart 
the regression model's validity. This is represented in figure 
three as the box labled '2'. 

At first glance, this view of the problem hints that there 

exists an iterative solution to the COFQ. Movement towards 

such a solution might be measured by the generation of a se- 

2 

quence of successively lower values of R for the COFQ. How- 
ever, obtaining an optimal Y is dependent on whether the sequence 
generated is convergent. Using Zangwill's general convergence 
theorem, it can be shown that this process will not generate 
a convergent sequence [Ref. 5]. For every selected, nature 
will find a vector Y orthogonal to that corresponding sub- 
space. It is the case that in a finite dimensional vector 
space, for any subspace with rank less than the vector space 
itself, there exists a vector orthogonal to that subspace. 

This is shown in figure 4 for 3-dimensions, where the subspace 
formed by x^ and x^ is of dimension 2. Vector Y is normal to 
the x^,X 2 , plane. 

D. FORMULATION AS A NON-LINEAR PROGRAMMING PROBLEM (NLP) 

FOR OPTIMIZATION 

In Section III.B, it was shown that the COF has an equiva- 
lent representation using a quadratic form for the objective 
function. The matrix B^^ in the COFQ form is a square symmetri- 
cal matrix of size (n x n) , and has rank k. In general, a 
quadratic function F(x) has the form 
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Figure 4. Y Orthogonal to Subspace 
F{x) = j^GX + C^X + a 

for a constant matrix G, column vector C, and scalar a 
(multiplication by 1/2 is included in the quadratic term to 
avoid the appearance of a factor of two in the derivatives) . 

The quantity 'G' is referred to as the Hessian matrix of F, 
which is the matrix of second partial derivatives. Thus, 
can be characterized as the Hessian matrix of a quadratic form. 
1 . Modeling of the COFQ in Stages 

The solution procedure for the COFQ can be broken down 
into several stages . 

(a) For a particular B. , optimize the quadratic form 

* 

to find the minimum Y . 

(b) Use an algorithm for optimal subset selection to 
look at some regions only and thereby avoid optimizing over 
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all (P) possible B. . This is based upon the reasoning that 
some (along with the associated constraints to be specified 
in the next section) will define convex regions which are 
larger than others and that the larger regions will produce 
a Y which is 'worse' than smaller regions will. Amongst those 

■k 

regions over which optimization was done, select the Y corres- 

2 

ponding to the smallest R value as the global minimum. These 
steps are addressed in the remainder of this chapter. 

2 . Modeling as a NLP 

This section addresses stage (a) ; optimizing with a 

. , . . . * . * 
particular to find the minimum Y . Each optimal Y corres- 
ponding to a B^ is a local optimum for the overall COFQ problem. 

Modeling this stage as a NLP (non-linear programming) 
problem, the objective function becomes : 

*rn * 

min Y B^Y 

From Section III. A, the constraints require Y to be of unit 

length and that Y be obtained through use of standardized 

regression coefficients. Assume the latter is met, hence the 
* 

use of Y in lieu of Y. The unit length requirement was 
stated as 



■*ip * 

Y Y 



1 . 



and the latter constraint was represented by 
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1 Y = 0 . 



The NLP now becomes: 



*>p * 

min Y B . Y 
— 1 — 



*rn * 

s.t. Y Y = 1 



rp * 

l^Y = 0 



i free; i = 1, . . . ,n 



Note that the y. are free variables, otherwise the standardi- 
^ 1 

zation requirement. 



n 



i=l 



0 , 



could not be met . 

The characterization of this NLP is that of a NLP with 

a non-linear equality constraint, referred to in the literature 

as a NEP [Ref. 6]. The constraint is a quadratic form for 

which the Hessian is the identity matrix. In searching for 

* 

the locally optimal Y , the optimization search must move 
along a quadratic surface at unit length from the origin of a 
n-dimensional hypersphere. (Recall from Section II. B that the 
intercept of the regression equation is zero.) This may be 
visualized in 3-dimensions as depicted in figure 5. This 
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Figure 5. Movement of Y along Quadratic Surface 

constraint requirement to move along a quadratic surface 
creates a non- convex NLP ^ which correspondingly increases the 
optimization complexity. Search direction methods for uncon- 
strained nonlinear optimization are nonapplicable. Further, 
reduced-gradient and gradient-projection methods developed for 
problems with nonlinear constraints will probably fail due 
to the non- convexity . 

The requirement to solve many NLPs as the are changed 
necessitates an optimization procedure that is general in nature. 
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IV. PROBLEM FORMULATION TWO 



The inherent difficulty with formulation one prompted a 
search for a computationally simpler model. By investigating 
the geometrical representation of the COF, could a linear 
model be built that, when solved, would solve the original 
problem? If an exact solution to the original problem was 
not reached, could an approximation be found that meets the 
'practical need' for the COF value as expressed in the Intro- 
duction of this paper? 

Such a linear model could utilize the extensively known 
results in linear programming; especially capitalizing on 
the speed of computation using existing linear programming 
algorithms . 

A similar methodology as was used in Section III.D to 
break the COFQ problem into stages, will be used in the solu- 
tion approach to the COF. One stage is optimizing to find the 
* 

minimum Y s locally through the use of a surrogate objective 
function. The second stage is either determining the global 
minimum Y* through enumeration, or selecting the local Y* 
most attractive as the answer (thereby approximately solving 
the COF) 



‘Enumeration seems economically feasible into the neighbor- 
hood of approximately 14 candidate predictor variables. 
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A. GEOMETRICAL INTERPRETATION 
T 

Let a = (a,,..., a ) be a non-zero vector. Consider the 
— 1 n 

vectors X that satisfy 



T 

a X = d 



for some scalar d. The set of X that satisfies this is de- 
fined to be a hyperplane. The vector a is termed the normal 
to the hyperplane, and the normalized vector 

a 

TYEU 



which has Euclidean length unity, is said to be the 'unit 
normal* to the hyperplane. 

One can think of a hyperplane as a shift from the origin 
of the (n-1) -dimensional subspace orthogonal to a [Ref. 7]. 
This can be seen in figure 6. Note that if d = 0 , the 
hyperplane (subspace) passes through the origin. This can 
be seen for three dimensions in figure 7. 

Recall from Section I.B that the column vectors of the 
matrix X (without dummy variable) used in linear regression 
define a n-dimensional finite vector space called . Fur- 
ther recall that combinatorial sets of k such vectors span 
subspaces of E^. If the assumption of linearity is valid for 
the regression model, then interpretation of such subspaces 
as being linear is valid. Thus, let each normalized column 
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Figure 6. Hyperplane in 3-Dimensional Space 




Figure 7. Hyperplanes through Center of a Sphere 
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vector of X represent the 'unit normal' to a (n-1) -dimensional 
hyperplane. Next, view each such hyperplane as a constraint 
of the form^ 



T 

a^X = d; i = 1,2, ...,p 

th 

where i denotes the i^ column of the data matrix of n-obser- 

T 

vations on p-predictor variables, and X = (Xj^ , X 2 , . . . , x^) . 

th 

Thus, the general equation for the i constraint can be 
written as follows; 



a , . X, + a^ . x^ + . 
I 3 1 2j 2 



. + a . X = d 

nj n 



where 



d = 0 for i = 1 , . . . , p . 

Collectively, these p constraints (or hyperplanes) inter- 
sect within the hyperspace and form convex poly topes; specif- 
ically, cones. An elementary example of this is shown in 
figure 8. Any vector in originating at the origin (to 
include Y, the vector of interest) will geometrically lie 
within some such cone as defined by a set of hyperplanes. 



A notation switch has taken place to enable an easy 
transition to commonly used linear programming notation. 

This X does not represent the predictor variables, but rather 
real numbers to be determined. 
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Figure 8. Convex Regions Formed in 2-Dimensions 
B. ALGEBRAIC MODEL 

Consistent with linear programming notation, the p- 
constraints defined by their unit normals can be written as 
follows : 



This system of constraints can be put into the standard 
form AX = b, where b = 0. Matrix A is the matrix of 



aT^x, + + ... + a, X 

11 1 12 2 In n 



0 




0 



a , X, + a ^x„ 
pi 1 p2 2 




0 . 
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coefficients and is actually the transpose of the nomalized 
column vectors from the independent (predictor) variable data 
matrix. 

Using the concepts of halfspaces, specifying _< or _> in 
each relationship above determines on which side of each con- 
straint a point in lies. Putting the system of constraints 
into standard formi will then require a slack (surplus) varia- 
ble for each constraint. 

C. CHARACTERIZATION OF THE CONES 

Since the regression model projects the dependent varia- 
ble Y onto the subspaces (hyperplanes) , we want to know for 
any point (vector) Y which hyperplanes are the closest to Y. 
This can be rephrased as asking, "Within which cone (convex 
polytope) does Y lie?" 

Consider some arbitrary point (i.e., vector originating 
T 

from origin) Xq = ( x^^ , x^ , . . . ,x^) in the hypersphere. Further, 
assume that the points are constrained in the distance they 
can be from the origin, so as to not have an infinite ray. 

Now, scanning out from it can be seen that some hyper- 

planes are 'closer' to ^ than are others ('closest' defined 
by the smallest angle 9^ between Xq and any vector in a 
particular hyperplane) . Define the set of hyperplanes which 
are closest to ^ when considering all directions as 'bounding 
hyperplanes', and the region bounded as the cone within which 
Xq lies; refer to such a cone as 'hole ' . The specific 
sizes and quantity of holes created by the intersecting 
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hyperplanes depends on the values given in the problem data. 
As we scan out from Xq , we know for certain that the closest 
hyperplane is one of the walls. Note, however, that we can- 
not extend this argument to identify the other walls of the 
hole in which lies. Thus, the question remains in identi- 
fying the walls of the hole (bounding hyperplanes) . The 
concepts for reduction of linear inequalities presents an 
answer to this question [Ref. 7] . 

1 . Reduction of Inequalities 

. . . n 

Suppose for any arbitrary point X in E (subject to 
X £ upper bound) all given hyperplanes are examined. Then 
there exist hyperplanes 'exterior' to the convex region 
defining H^. These exterior hyperplanes, when viewed as 
constraints are non-binding. As an elementary example of 
this, a hole is depicted in figure 9 corresponding to the 
following system of constraints. 



T 

a^X 


< 


0 


hyperplane 


Pi 


T 


> 


0 


hyperplane 


P2 


T 


> 


0 


hyperplane 


?3 


L.B. 


< 


X 


£ U.3 . 





As can be seen from figure 9, the third constraint is 
exterior to (the hole identified by the letter i) and can 
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Figure 9 . 2-Dimensional Depiction of Non-Binding p 

be termed 'redundant'. This redundant constraint can be 
eliminated without changing the solution set. 

Now, suppose a convex region is specified by adding 
slack variables, and putting the system of equations in 
standard form . Thus (assuming artificial variables have 
been forced out) , we get; 

a^X + = 0 

T 

"^ 2 — ^2 ~ ^ 

T 

-a^X + s^ = 0 

X ^ |L.B.| 

S > 0 . 
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In standard form, this constraint redundancy is reflected in 
that s^ is a 'non-extremal' variable and hence it, together 
with the third constraint, can be eliminated. This special 
example generalizes to higher dimensional problems. In 
general, redundant inequalities show up* as having non-extremal 
slack variables. 

Here the L.B. is any arbitrary positive real number 
sufficiently big to avoid roundoff problems, yet to prevent 
convergence at the origin. The following algorithm, using 
simplex techniques, will identify the binding constraints 
(walls) for any hole. 

2 . Algorithm: Removal of Redundant Constraints 

Assume that some combination of 'less than or equal 
to' and 'greater than or equal to' inequality signs for the 
p constraints are specified. Call this system of p inequali- 
ties the 'resource constraints' and put it in standard form. 

P 

a. Find a feasible solution: min ) A. , where A. 

i = l ^ ^ 

is the artificial variable corresponding to the 

. th ^ , 6 

1 constraint. 

b. Let S = {s.,s^,...,s }; the set of slack varia- 

12 p 

bles . 

c. Select s. from S. 

1 

g 

Failure to find a feasible solution for that combination 
of signs implies that a convex region is not defined, 

and therefore that combination can be ignored. 
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d. Min s^, subject to resource constraints and 
variable bounds. 

e. If s^ is non-extremal (s^ > 0) , place index i 
in set R. 

f. If set S has been exhaustively examined, go to 
Step g; otherwise increment i and go to Step c. 

g. Remove or 'fix' all constraints i s.t. i e {R}. 

The result of this is that the boundaries of the 

convex region within which a specified point (vector) lies 
have been identified. Any optimization need only consider 
this subset of the resource constraints. 

D. OPTIMIZATION FOR LOCAL MINIMA 

2 

The COF seeks a Y such that R is minimized. For a given 

^ and the corresponding cone within which ^ lies, this 

minimum is approached as Y moves away from the nearest hyper- 

2 

plane. Recall that as the angle of projection increases, R 

decreases. However, past a certain position, the angle of 

projection between Y and a different hyperplane will decrease 

enough such that the regression will select this second hyper- 

2 

plane to project Y onto instead, since a higher R value can 
be obtained. 

If this process were to occur between Y and each of the 
walls of the hole simultaneously, an 'equilibrium point' for 
Y would be reached. This equilibrium point is defined to be 
the center of the hole. This can be visualized in 3-dimensions 
in figure 10 . 
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Figure 10. Vector Y at Equilibrium Point 

It is at this equilibrium point where if Y were projected 
onto any of the walls, that the angles of projection would be 
the 'worst' ; we are maximizing the minimum angle. The Y 
representing this point, and normalized to be unique, is the 
worst Y to be predicted using regression for this region of 
• 

The search for a local minimum Y thus becomes a search 
for a vector that originates at the origin and passes through 
the center of the hole. Considering the hole in the context 
of a convex region, we are searching not for a solution at an 
extreme point, but rather at the center of the region. 
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viewing the previous figure from 'topside', we see Y as 
a point in the center of a (n-1) -dimensional convex region. 




Figure 11. Top View of 2-Dimensional Convex Region 

Such an equilibrium point can be approximated by optimizing 
using slack associated with the extremal constraints repre- 
senting the hole. The objective function for optimization 
can be formulated as 

T 

max [min C.X] , 

X i ^ 

where C. is included for all i ft {R}. Note each C. contains 
—1 —1 

only one non-zero coef ficient--that coefficient representing 

th 

the slack variable for the i resource constraint. The final 
X obtained is the optimal value of Y. 

T 

The original form of the COF required Y Y = 1 (i.e., unit 
length) , and that for the COFQ this requirement created a 
non-convex NLP. In figure 12 for 3-dimensions, it can be seen 
that there exists a hyperplane 'tangent' to this quadratic 
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Figure 12. Hyperplane Touching Surface at One Point 



surface such that the closest point on the hyperplane to the 

* * 

origin is the optimal Y . Then the vector Y is the orthogonal 
unit normal to the hyperplane. 

/N /N 

Now, let the initial value of Y (Y is normalized) specify 
a bounding hyperplane to the hole. For each new value of Y 
a new bounding hyperplane can be specified, thereby linearly 
approximating the quadratic surface in successive increments 
as Y moves towards the center of the hole (optimal Y) . 

Using matrix A with the understanding that it contains only 
the extremal resource constraints , the optimization problem 
can be formulated as 



max [min Ctx] 
X i ^ 



s.t. AX + IS, 
— —A 



< 1 



0 



> 0 



X free 

where a = values of X for the (k-1) iteration, and Y is 

represented by the current value of X. For each hole, the 

initial Y is the normalized value of the BFS to the L.P. 

which was reached using the algorithm in Section C.2. The 
^ th 

values of Y for the (k-1) iteration become the coefficients 
k ^ In 

of a (k iteration) . Note that Y is normalized between each 
iteration. The quantity is an identity matrix and is 

the column vector of slack variables corresponding to the 
extremal resource constraints (hence the use of subscript A) . 
This problem can be rewritten in final form as the L.P.: 

max Z 

s.t. AX + IS^ = 0 

— —A 

Z < C^X i / {R} 




S >0 
-A - 

X free. 

E. OPTIMIZATION FOR A GLOBAL MINIMUM 

Recall from Section IV. 3 that the direction of the inequality 
signs for the resource constraints determines which convex 
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set is the feasible region. We can see that there are 2^ 
such sets to consider for local optimization in the search 
for a global minimum to the COF. 

When the number of candidate predictor variables is small, 
say up to about p = 14, enumeration is economically practical. 
For problems with p larger than this, only holes which look 
larger than most of the other holes could be optimized as 
'candidates' for the global minimum. Such a selective opti- 
mization would require some type of global information prior 
to any optimization. A procedure to find such information 
was not found. This impasse led to the algorithm presented 
below. This algorithm, when given a local optimum, searches 
adjacent holes for a better minimum. If none better is found, 
it stops; otherwise, it uses the best hole and continues to 
search from there. In theory it might search all 2^ holes, 
but this is doubtful for real world problems (also, an itera- 
tion stop could be put in if desired) . Note that the solution 
is still local in nature, although it may be the global solu- 
tion. A heuristic approach may have to be developed to decide 
which hole to use to begin such a search. 

1. Algorithm: Searching the Neighborhood about Y Min 

An algorithm to search the neighborhood about a local 
minimum in an effort to find the global solution, and if not, 
to find a better minimum (larger Z value) , is as follows: 

a. Select some combination of inequality signs (£^^) 
for the resource constraints. 
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b. Get a BFS (basic feasible solution ) , remove 
redundant constraints, and then maximize Z 
(Z defined in previous section) . 

c. Record extremal constraints as set E (along with 
the direction of the inequalities) , and record 

•jif 

the value of X and Z. 
til. 

d. For the k constraint of E, reverse the inequality 
(multiply the constraint by -1) . Let J = {k^^ 
constraint, reversed} u {all resource constraints - 
k }. Optimize over J as in Step b. If Z^ > Z, 
record as in Step c (use label other than E) , 

and set ’NEXT’ = k. 

e. If all elements of set E have been exhaustively 
examined, go to Step f; otherwise, increment k 
and go to Step d. 

f. If no Z for set E is better than the initial Z, 
STOP. Otherwise, let J = (constraint ’NEXT’} 

u (all resource constraints -’NEXT’, reversed}. 

Fix ’NEXT’ so as to not reverse its inequality 
sign again. Go to Step b. 

This results in a minimum that although is local, 
can be considered the best solution possible in that region 
of the hypersphere. 

Using the algorithms presented and the solution ob- 

2 

tained, the analyst can compute the corresponding R value 
by forcing the regression or using the equations in Section 
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III.B. Thus, he now has a relative measure of his model's 

2 

applicability through comparison of the R value pertaining 

to his prediction equation (obtained through least squares 

2 

regression ) , to the lowest R obtainable for that set of 
candidate predictor variables. 
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V. SUMMARY 



2 

The empirical model builder, in utilizing R for a measure 

of 'goodness of fit', needs information concerning the quality 

of this statistic. This paper has addressed this problem 

utilizing optimization techniques to help the model builder 

2 

assess the amount of confidence that can be placed in a R 
value pertaining to a particular set of candidate predictor 
variables . 

The linear programming algorithms presented offer a 

practical, fast, and cost effective methodology to search 

the hyperspace in which the linear regression model will 

2 

operate. The lowest value of R (globally) achievable for a 
particular set of cost data can be found when the number (p) 
of candidate predictor variables is small. Whereas, when 
the number of variables nears fourteen or more, a local mini- 
mum must be accepted due to computational costs inherent in 
the presented methodology. 
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